TOAST results for OAEI 2012
Abstract
The Tensor-based Ontology Alignment SysTem (TOAST) is a general-purpose (i.e., domain-unspecific), self-configurable (i.e., requiring no user intervention) ontology matching tool. TOAST is based on one of the first tensor-based approaches to Statistical Relational Learning. As one possible application of the Statistical Relational Learning framework, TOAST may be seen as a system that performs probabilistic inference with regard to a single relation only: the relation representing the ‘semantic equivalence’ of ontology classes or their properties. Owing to the flexibility of the integrated tensor-based representation of heterogeneous data, TOAST is able to learn the semantic equivalence relation from the partial match data included in a training set.

1 Presentation of the System

The Tensor-based Ontology Alignment SysTem (TOAST) presented in this paper is an application of an extended version of our tensor-based approach to Statistical Relational Learning (SRL), referred to as the Tensor-based Reflective Relational Learning Framework (TRRLF) [12]. In general, SRL is one of the most intensively investigated problems of Artificial Intelligence. Recently proposed tensor-based SRL methods are widely regarded (e.g., see [9]) as a promising alternative to the commonly used graphical models, such as Bayesian Networks and Markov Logic Networks [2], [5]. To our knowledge, TOAST represents the first tensor-based approach to ontology alignment.

We use a 3rd-order tensor as a data structure suitable for representing data provided as a set of RDF triples [4], [9]. Several recent works consider the use of tensors to represent relational data given as RDF triples [5], [9], [4], [11]. The authors of these works assume that the active mode (corresponding to the RDF subject role) and the passive mode (corresponding to the RDF object role) of each entity have to be modeled as two separate tensor modes.
However, they do not address the questions of (i) how to model the relation between the two modes of the same entity and (ii) how the orientation of this relation (i.e., the choice of which entity plays the active role and which plays the passive role for a given relation) influences system performance [9], [4], [11]. We address these issues by modeling the data in a way that gives a high level of flexibility in specifying the roles that any pair of entities plays with regard to any relation. Consequently, we represent the active and passive modes of a given entity as potentially fully independent of each other: it is the correlation of the active mode and the passive mode, as observable in the input data, that fully determines the extent to which the vectors representing the modes are algebraically similar to each other.

As we have shown in our experiments, the proposed tensor-based representation of relational data (in particular RDF triples) is appropriate for the ontology alignment task. It is worth noting that the internal data representation of TOAST is based on a probabilistic model of a vector space that has so far been used only in quantum Information Retrieval [13]. It should be stressed that TOAST does not require external knowledge sources, such as dictionaries or thesauruses, in order to provide high-quality results. However, the use of such knowledge is possible: it may be realized by converting the data into the subject-predicate-object format [12], as discussed in Section 3.

1.1 State, Purpose, General Statement

TOAST is a fairly general-purpose ontology alignment tool. As a specialized application of our SRL framework (i.e., the TRRL framework), TOAST may be seen as a system that performs probabilistic inference with regard to a single relation only: the relation representing the semantic equivalence of ontology classes or their properties.
The TRRL’s flexibility, which is typical of SRL methods, is clearly visible in the propositional representation of all the heterogeneous data provided to the system (including the propositional representation of the occurrence of terms in the labels of the ontology classes).

The evaluation of TOAST has focused on the Anatomy track, which is among the OAEI tracks that use the most expressive ontologies [3]. For this reason, we have not prepared the TOAST system to parse input data for any OAEI track other than Anatomy. As a result, the Anatomy test is the only OAEI track test that TOAST passes. On the other hand, it should be noted that, in 2012, TOAST is the only matching system that can exploit additional partial alignments in the Anatomy track. To illustrate this, we present an additional experimental evaluation that was performed, as suggested by the OAEI organizers, on the OAEI 2010 dataset (see footnote 1), whose training set includes partial alignments. We show that, when partial alignments are available, TOAST is able to learn the semantics of all the relations [8], including the matchesTo relation, on the basis of the partial alignment data. This allows the system to exploit ‘a behavioral dimension’ of alignment modeling and generation [12]. The results of the TOAST evaluation presented in this paper are comparable with the results of the leading systems that were evaluated on Subtask #4 of the 2010 edition of the Anatomy track (see footnote 2).

Footnote 1. Anatomy 2010 modified dataset: http://oaei.ontologymatching.org/2010/anatomy/modifications2010.html
Footnote 2. Anatomy Results of 2010 Evaluation: http://oaei.ontologymatching.org/2010/results/anatomy/index.html

1.2 Specific Techniques Used

As TOAST is based on an SRL method, all techniques used in the system may be regarded as SRL solutions rather than solutions specific to the ontology matching task.
From such a general perspective, TOAST may be seen as a system that exploits a new algebraic data representation and processing method as a means of ontology alignment.

Tensor-Based Relational Data Representation

The tensor used in TOAST [12] can be seen as a tensor product $T_{i,j,k} = [t_{i,j,k}]_{n \times n \times m} = S \times O \times R$ of vector spaces whose coordinates correspond to the set of subjects $S$, the set of objects $O$, and the set of relations $R$. We assume that $|R| = m$ and that $|S| = |O| = n$. Additionally, we define the set $F$ as the set of all known facts (i.e., RDF triples) used to build the input tensor. The number $|F| = f$ determines the number of positive cells in the input tensor. Moreover, we define the set $E = S \cup O \cup R$ as the set of elements (i.e., subjects, objects, and relations) used in the input data, each represented in $T$ by a slice (2nd-order array) of the 3rd-order tensor [12].

Owing to the flexibility of the proposed tensor data model, it is possible to integrate information about the ontology schema structure with lexical knowledge. Therefore, the set $F$ contains facts about the relations between ontology entities as well as between ontology entities and terms (representing lexical information) [12].

[Figure: the input tensor, with relation slices labeled r1, r2, rk, termOf, equalTo, matchesTo]
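The construction described above can be illustrated with a minimal Python sketch. It is not the TOAST implementation: the helper name build_fact_tensor, the example entity names, the subClassOf relation, the direction chosen for termOf, and the use of a dense binary NumPy array are all assumptions made for illustration. Only the n × n × m shape, the shared subject/object index (|S| = |O| = n), and the idea that each fact in F marks one positive cell come from the text.

```python
import numpy as np


def build_fact_tensor(triples):
    """Build a binary subject x object x relation tensor from (s, p, o) triples.

    Hypothetical helper for illustration; cell values in the real system
    need not be binary.
    """
    # Every entity gets one index shared by the subject and object modes,
    # so both modes have the same size n.
    entities = sorted({s for s, _, _ in triples} | {o for _, _, o in triples})
    relations = sorted({p for _, p, _ in triples})
    ent_idx = {e: i for i, e in enumerate(entities)}
    rel_idx = {r: k for k, r in enumerate(relations)}

    n, m = len(entities), len(relations)
    tensor = np.zeros((n, n, m), dtype=np.int8)
    for s, p, o in triples:
        # One positive cell per known fact.
        tensor[ent_idx[s], ent_idx[o], rel_idx[p]] = 1
    return tensor, ent_idx, rel_idx


# Made-up facts mixing schema structure, lexical information, and one
# partial-alignment fact (all names are illustrative).
facts = [
    ("term:heart", "termOf", "a:Heart"),
    ("term:heart", "termOf", "b:heart_organ"),
    ("a:Heart", "subClassOf", "a:Organ"),
    ("a:Heart", "matchesTo", "b:heart_organ"),
]

tensor, ent_idx, rel_idx = build_fact_tensor(facts)
print(tensor.shape)       # (n, n, m)
print(int(tensor.sum()))  # number of positive cells, i.e. f for distinct facts
```

A dense array is used here only for readability; with realistic ontologies the tensor is extremely sparse, so a sparse representation would be the more practical choice.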
Publication date: 2012